1/83 ADD ISSUE 1 DOCUMENT PROCESSING GUIDE 


II. DOCUMENT PREPARATION 
ADVANCED EDITING 
1. Introduction 


The advanced editing part is meant to help UNIX operating system users (secretaries, typists, programmers,
etc.) make effective use of facilities for preparing and editing documents, text, programs, files, etc. It provides
explanations and examples of:


• special characters, line addressing, and global commands in the text editor (ed)


• commands for “cut and paste” operations on files and parts of files, including the mv, cp, cat, and rm
commands, and the r, w, m, and t commands of the text editor


• editing scripts and text editor-based programs like grep and sed.


Although this document is written for nonprogrammers, new UNIX operating system users with any back- 
ground should find helpful hints on how to get their jobs done more easily. The UNIX operating system provides 
effective tools for text editing, but that by itself is no guarantee that everyone will automatically make the most 
effective use of them. In particular, users who are not computer specialists (typists, secretaries, casual users) 
often use the UNIX operating system less effectively than they could. The reader should be familiar with the 
material in the “Basics For Beginners” section of the User’s Guide—UNIX Operating System before using the
text editor. Further information on all commands discussed here can be found in the User’s Manual—UNIX
Operating System.


Examples are based on experience and observations of users and the difficulties encountered. Topics covered 
include special characters in searches and substitute commands, line addressing, the global commands, and line 
moving and copying. There are also brief discussions on the effective use of related tools, e.g., those for file ma- 
nipulation and those based on ed. 


The next paragraphs discuss shortcuts and labor-saving devices. Some will be instantly useful; others should
provide ideas for future use. Until these things are used enough to build confidence, they will remain
theoretical knowledge.


Note: A document like this should provide ideas about what to try. There is only one way to learn to 
use something, and that is to use it. Reading a description is no substitute for hands-on use. 


2. Special Characters 


The ed program is the primary interface to the system, so it is worthwhile to know how to get the most 
out of it with the least effort. 


2.1 Print and List Commands 


Two commands are provided for printing contents of lines being edited. Most users are familiar with the 
print command (p) in combinations like 


1,$p 
to print all lines that are being edited, or 
s/abc/def/p


to change “abc” to “def” on the current line and to print the results. Less familiar is the list command (l), which
gives slightly more information than p. In particular, l makes visible characters that are normally invisible,
such as tabs and backspaces. If a line listed contains some of these, l will print each tab as “>” and each
backspace as “<”. This makes it easier to correct typing mistakes that insert extra spaces adjacent to tabs or
a backspace followed by a space.


The l command also “folds” long lines for printing. Any line that exceeds 72 characters is printed on multiple 
lines. Each printed line except the last is automatically terminated by a backslash (\) to indicate that the line 
was folded. A “$” character is appended to the real end of line. This is useful for printing long lines on terminals 
having output line capability of only 72 characters per line. 


Occasionally, the l command will print in a line a string of numbers preceded by a backslash, such as \07
or \16. These combinations are used to make visible characters that normally do not print, e.g., form feed, verti- 
cal tab, or bell. Each such combination is interpreted as a single character. When such characters are detected, 
they may have surprising meanings when printed on some terminals. Often their presence means that a finger 
slipped while typing.


2.2 Substitute Command 


The substitute command (s) is used for changing the contents of individual lines. It is probably the most 
complex and effective of any ed command. 


The meaning of a trailing g after a substitute command is illustrated in the next two commands:


s/this/that/ 
and 

s/this/that/g 
The first form replaces the first “this” on the line with “that”. If there is more than one occurrence of “this” 
on the line, the second form (with the trailing g) changes all of them. Either of the two forms of the s command 
can be followed by p or l to print or list the contents of the line:

s/this/that/p

s/this/that/l

s/this/that/gp 

s/this/that/gl 


All are legal and have slightly different meanings. 


An s command can be preceded by one or two line numbers to specify that the substitution is to take place
on a group of lines specified by the line numbers. Thus: 


1,$s/mispell/misspell/ 


changes the first occurrence of “mispell” to “misspell” on every line of the file. The following command changes 
every occurrence in every line: 


1,$s/mispell/misspell/g 


When a p or l is added to the end of any of these substitute commands, only the last line that was changed is
printed, not all lines. How to print all the lines that were changed is described later.
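
The difference between the plain form and the trailing-g form can be tried outside the editor. The following Python fragment is only a rough analogue: it assumes that Python's re.sub can stand in for the s command (count=1 for the plain form, no count for the g form), and the sample lines are invented for illustration. It is not how ed itself works.

```python
import re

# Invented sample buffer; each list element stands for one line of the file.
lines = ["mispell and mispell again", "no error here", "one mispell"]

# Rough analogue of 1,$s/mispell/misspell/ : first occurrence on each line only.
first_only = [re.sub("mispell", "misspell", line, count=1) for line in lines]

# Rough analogue of 1,$s/mispell/misspell/g : every occurrence on each line.
every = [re.sub("mispell", "misspell", line) for line in lines]

print(first_only)
print(every)
```

Only the first line differs between the two results, since it is the only line with more than one occurrence.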


Any character can be used to delimit pieces of an s command. There is nothing sacred about slashes (but
slashes must be used for context searching). For instance, for a line that already contains a lot of slashes, e.g.,


//exec //sys.fort.go //etc...
a colon could be used as the delimiter. To delete all the slashes, the command is
s:/::g


2.3 Undo Command


Occasionally, an erroneous substitution will be made in a line. The undo command (u) negates the last com- 
mand so that data is restored to its previous state. This command is useful after executing a global command 
if it is discovered the command did things that are undesirable. 


2.4 Metacharacters 


When using ed, certain characters have unexpected meanings when they occur in the left side of a substitute 
command or in a search for a particular line. These are called “metacharacters” which are: 


• Period
• Backslash
• Dollar Sign
• Circumflex
• Star
• Brackets
• Ampersand.


Even though metacharacters are discussed separately in the following text, they can be combined. An example 
is given in the paragraph on “Circumflex” (2.4.4). 


2.4.1 Period 


The period (.) on the left side of a substitute command or in a search with /.../, stands for any single charac- 
ter. Thus the search 


/x.y/


finds any line where “x” and “y” occur separated by a single character, as in
x+y
x-y
x<sp>y
x.y
The <sp> stands for a space whenever needed to make it visible. 


Since the period matches any single character, a way to deal with the “invisible” characters printed by l
is available. For instance, if there is a line that, when printed with the l command, appears as


... th\07is ...
and it is desired to get rid of the “\07” (the bell character), the most obvious solution is to try
s/\07//


This will fail. The brute force solution, which is to retype the entire line, is a reasonable tactic if the line in ques- 
tion is not too long. However, for a very long line, retyping could result in additional errors. Since “\07” really 
represents a single character, the command 


s/th.is/this/ 


gets the job done. The period matches the mysterious character between the “h” and the “i”, whatever it is. 
Since the period matches any single character, the command 
s/./,/


converts the first character on a line into a “,”. 


As is true of many characters in ed, the period has several meanings depending on its context. This line 
shows all three: 


.s/././
• The first period is the line number of the line being edited, which is called “dot”.


• The second period is a metacharacter that matches any single character on that line (in this instance
the first character of the line).


• The third period is the only one that really is an honest literal period. On the right side of a substitution,
the period is not special.


2.4.2 Backslash 


Since a period means “any character”, the question arises of what to do when a period is really needed. For 
example, to convert the line: 


Now is the time. 
into 

Now is the time? 
the backslash (\) is used. A backslash turns off any special meaning that the next character might have. In par- 
ticular, \. converts the period from a “match anything” into a “match the period” statement. The \. pair of char- 
acters is considered by ed to be a single literal period. To replace the period with a question mark, the following 
command is used: 


s/\./?/


The backslash can also be used when searching for lines that contain a special character. If a search is made 
to look for a line that contains 


.PP 


the search 
/.PP/

is not adequate. It will find a line like 
THE APPLICATION OF ... 

The period matches the letter “A”. But if the command 
/\.PP/ 

is used, only the lines that contain “.PP” are found.


The backslash can also be used to turn off special meanings for characters other than the period. For exam- 
ple, to find a line that contains a backslash, the search 


/\/ 


will not work because the \ is not a literal backslash, but instead means that the second / no longer delimits 
the search. A search can be made for a literal backslash by preceding the backslash with another \:


/\\/ 
Similarly, searches can be made for a slash (/) with 
/\// 


The backslash turns off the meaning of the immediately following / so that it does not terminate the /.../
construction prematurely.


Some substitute commands, each of which will convert the line 


\x\.\y 
into the line 

\x\y 
are 

s/\\\.//

s/x../x/

s/..y/y/ 
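
The claim that all three commands produce the same line can be checked with a short experiment. The sketch below assumes Python's re module as a stand-in for ed; for these three patterns the notation happens to coincide, but in general the two pattern languages differ. count=1 imitates an s command without a trailing g.

```python
import re

line = r"\x\.\y"          # the six characters  \ x \ . \ y

# Python counterparts of the three ed commands.
results = [
    re.sub(r"\\\.", "", line, count=1),   # s/\\\.//  : literal backslash, then literal period
    re.sub(r"x..", "x", line, count=1),   # s/x../x/  : x plus any two characters
    re.sub(r"..y", "y", line, count=1),   # s/..y/y/  : any two characters plus y
]

for result in results:
    print(result)
```

Each command removes the same two characters (the backslash and the period in the middle); they differ only in how that pair is described.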


The user’s erase character and the line kill character (# and @ by default) must also be used with a 
backslash to turn off their special meaning. This is a feature of the UNIX operating system. When adding text 
with append (a), insert (i), or change (c) commands, the backslash is special only for the erase and line kill char- 
acters, and only one backslash should be used for each one needed. 


2.4.3 Dollar Sign 


In the left side of a substitute command or in a search command, the dollar sign ($) stands for “the end of
line”. The word “time” can be added to the end of the line


Now is the 

with the following command: 
s/$/<sp>time/ 

The result is 
Now is the time 

A space is needed before “time” in the substitute command; otherwise, the result is
Now is thetime 

The second comma in the following line can be replaced with a period without altering the first. 
Now is the time, for all good men, 

The needed command is 
s/,$/./ 


The $ provides context to indicate which specific comma. Without it the s command would operate on the first 
comma to produce 


Now is the time. for all good men, 
To convert 
Now is the time. 
into 
Now is the time? 
(as was done previously with the backslash), the following command can be used:


s/.$/?/
The $ has multiple meanings depending on context. In the line 


$s/$/$/ 
• The first $ refers to the last line of the file.
• The second $ refers to the end of the line.
• The third $ is a literal dollar sign to be added to that line.


2.4.4 Circumflex


The circumflex (^), alias “hat” or “caret”, stands for the beginning of the line. For example, if a search is
made for a line that begins with “the”, the command 


/the/ 


will in all likelihood find several lines that contain “the” before arriving at the line that was wanted. But the 
command 
/^the/
narrows the context, and thus arrives at the desired line more easily.
The other use of ^ is to enable text to be inserted at the beginning of a line. The command
s/^/<sp>/


places a space at the beginning of the current line. 


Metacharacters can be combined. For example, to search for a line that contains only the characters 
.PP
the command
/^\.PP$/
can be used. 
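
The narrowing effect of the backslash and of the two anchors can be seen by running the three searches over the same lines. This is a minimal sketch that assumes Python's re.search as a substitute for an ed context search; the sample lines are invented.

```python
import re

# Invented sample lines.
lines = [".PP", "THE APPLICATION OF ...", "see .PP below", ".PP 2"]

loose = [l for l in lines if re.search(r".PP", l)]         # like /.PP/
escaped = [l for l in lines if re.search(r"\.PP", l)]      # like /\.PP/
anchored = [l for l in lines if re.search(r"^\.PP$", l)]   # like /^\.PP$/

print(loose)
print(escaped)
print(anchored)
```

The unescaped period accepts “APP”, the escaped period still accepts “.PP” anywhere on a line, and only the fully anchored pattern insists that the line contain nothing else.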


2.4.5 Star


The star (*) is useful to replace all spaces between x and y with a single space, as in the following example:
text x          y text


where text stands for lots of text, and there are an indeterminate number of spaces between x and y. The line 
is too long to retype, and there are too many spaces to count. 


A regular expression (typically a single character) followed by a star stands for as many consecutive occur- 
rences of that regular expression as possible. To refer to all the spaces at once, the following command is used: 


s/x<sp>*y/x<sp>y/ 


The construction <sp>* means “as many spaces as possible”. Thus x<sp>*y means: “an x, followed by as many 
spaces as possible, and then a y”. 


The star can be used with any character, not just space. If the original example was 
text x--------y text
then all “-” characters can be replaced by a single space with the command
s/x-*y/x<sp>y/
If the original line was
text x..........y text
and if the following command was typed: 


s/x.*y/x<sp>y/ 


what happens depends upon the occurrence of other x’s or y’s on the line. If there are no other x’s or y’s, then 
everything works, but it is blind luck, not good management. Since a period matches any single character, then 
.* matches as many single characters as possible. Unless the user is careful the star can eat up a lot more of 
the line than expected. If the line was 

text x text x................y text y text


then the command will take everything from the first “x” to the last “y”, which, in this example, is undoubtedly 
more than wanted. The proper way is to turn off the special meaning of period with \.: 


s/x\.*y/x<sp>y/

Now everything works since \.* means “as many periods as possible”. 

There are times when the pattern .* is exactly what is wanted. For example, to change 

Now is the time for all good men ... 

into 
Now is the time. 

the following deletes everything after the word “time”: 
s/<sp>for.*/./ 

There are a couple of additional pitfalls associated with * to be aware of. Most notable is that “as many 
as possible” means zero or more. The fact that zero is a legitimate possibility is sometimes rather surprising. 
For example, if this line contained 

text xy text x          y text
and the command is
s/x<sp>*y/x<sp>y/


the first “xy” matches this pattern, for it consists of an “x”, zero spaces, and a “y”. The result is that the substi- 
tute acts on the first “xy” and does not touch the later one that actually contains some intervening spaces. 


The proper way is to specify a pattern like 
/x<sp><sp>*y/ 
which says “an x, a space, as many more spaces as possible, and then a y” (in other words, one or more spaces). 


The other startling behavior of * is also related to the zero being a legitimate number of occurrences of some- 
thing followed by a star. The command 


s/x*/y/g
when applied to the line
abcdef


produces 

yaybycydyeyfy 
which is almost certainly not what was intended. The reason for this behavior is that zero is a legal number 
of matches, and there is no “x” at the beginning of the line (so that gets converted into a “y”), nor between the 
“a” and the “b” (so that gets converted into a “y”), etc. The following command:

s/xx*/y/g 
where “xx*” is “one or more x’s”, when applied to the line 

abcdefxghi
produces

abcdefyghi
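
The pitfalls of the star are easier to remember after watching them happen. The fragment below is a sketch under the assumption that Python's re.sub behaves like the s command for these simple patterns (count=1 for a command without g). The sample strings are invented, and deliberately avoid the word “text” because it contains an “x” of its own.

```python
import re

crowded = "aaa x bbb x......y ccc y ddd"

# x.*y takes everything from the first x to the last y.
greedy = re.sub(r"x.*y", "x y", crowded, count=1)
# x\.*y accepts only periods between the x and the y.
periods_only = re.sub(r"x\.*y", "x y", crowded, count=1)

spaced = "xy and x    y"
# Zero spaces is a legal match, so the leading "xy" is the one changed.
zero_allowed = re.sub(r"x *y", "x y", spaced, count=1)
# A space followed by zero or more spaces requires at least one space.
one_or_more = re.sub(r"x  *y", "x y", spaced, count=1)

# With g (no count), x* also matches the empty string at every position.
everywhere = re.sub(r"x*", "y", "abcdef")
at_least_one = re.sub(r"xx*", "y", "abcdefxghi")

for value in (greedy, periods_only, zero_allowed, one_or_more, everywhere, at_least_one):
    print(value)
```

In each pair, the second pattern is the more specific one, and it is the one that leaves the rest of the line alone.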


2.4.6 Brackets 


Should a number that appears at the beginning of all lines of a file need to be deleted, a first thought might 
be to perform a series of commands like: 


1,$s/^1*//
1,$s/^2*//
1,$s/^3*//


This is going to take forever if the numbers are long. Unless it is desired to repeat the commands over and over
until finally all numbers are gone, the digits can be deleted on one pass. This is the purpose of brackets ([ ]).


The construction 
[0123456789] 


matches any single digit. The whole thing is called a “character class”. With a character class, the job is easy. 
The pattern “[0123456789]*” matches zero or more digits (an entire number), so 


1,$s/^[0123456789]*//
deletes all digits from the beginning of all lines. 


Any characters can appear within a character class; and just to confuse the issue, there are essentially no 
special characters inside the brackets. Even the backslash does not have a special meaning. The following command
searches for any of the special characters listed within the brackets:


/[.\$^[]/


Within a character class, the [ is not special. To get a ] into a character class, it should be placed as the first 
character in the class. For example: 


/[]^$[]/


It is a nuisance to have to spell out the digits. They can be abbreviated as [0-9]; similarly, [a-z] stands for 
the lowercase letters and [A-Z] for uppercase letters. 


The user can specify a character class that means “none of the following characters”. This is done by beginning
the class with a circumflex, as in


[^0-9]


which stands for “any character except a digit”. The following search finds the first line that does not begin 
with a tab or space: 


/^[^(space)(tab)]/


Within a character class, the circumflex has a special meaning only if it occurs at the beginning. For exam- 
ple: 


/^[^^]/


finds a line that does not begin with a circumflex. 
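
Character classes can be explored in the same way. The following sketch again assumes Python's re module as a stand-in for ed; the range and negation notation shown here is common to both, although other details of bracket handling differ. All sample lines are invented.

```python
import re

# Deleting a leading number from every line, like 1,$s/^[0-9]*//
numbered = ["12 first", "345 second", "third"]
stripped = [re.sub(r"^[0-9]*", "", l) for l in numbered]

# First line that does not begin with a space or tab, like /^[^(space)(tab)]/
indented = ["\tone", "  two", "three", "four"]
first_flush = next(l for l in indented if re.search(r"^[^ \t]", l))

# Lines that do not begin with a circumflex, like /^[^^]/
carets = ["^caret", "plain"]
no_caret = [l for l in carets if re.search(r"^[^^]", l)]

print(stripped)
print(first_flush)
print(no_caret)
```

The line “third” is unaffected by the first substitution because the pattern matches zero digits there and replaces them with nothing.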


2.4.7 Ampersand 
The ampersand (&) is used primarily to save typing. For example, if the following is the original line: 
Now is the time 
and it needs to be 
Now is the best time 


the command 


s/the/the best/


can be used, but it is unnecessary to repeat the “the”. The & is used to eliminate the repetition. On the right-hand
side of a substitute command, the ampersand means “whatever was just matched”, so in the command 


s/the/& best/ 


the & represents “the”. This is not much of a saving if the text matched is just “the”; but if it is something long 
or complicated or if it is something (such as .*) which matches a lot of text, the & can save some tedious typing. 
There is also much less chance of making a typing error in the replacement text. For example, to parenthesize 
a line, regardless of its length: 


s/.*/(&)/ 


The ampersand can occur more than once on the right side. 


s/the/& best and & worst/ 
makes the original line 


Now is the best and the worst time 


and 


s/.*/&? &!!/


converts the original line into
Now is the time? Now is the time!! 
To get a literal ampersand, the backslash is used to turn off the special meaning. 
s/ampersand/\&/ 
converts the word into the symbol. The & is not special on the left-hand side of a substitute, only on the right. 
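
The same substitutions can be imitated in other tools, but the notation for “whatever was just matched” varies. In the sketch below, Python's re.sub is assumed as the stand-in, and there the role of ed's & is played by \g<0>, while a plain & in the replacement is an ordinary character. count=1 imitates an s command without a trailing g.

```python
import re

line = "Now is the time"

best = re.sub(r"the", r"\g<0> best", line, count=1)         # like s/the/& best/
wrapped = re.sub(r".*", r"(\g<0>)", line, count=1)          # like s/.*/(&)/
doubled = re.sub(r".*", r"\g<0>? \g<0>!!", line, count=1)   # like s/.*/&? &!!/

# In Python the replacement "&" needs no backslash; in ed it would be \&.
symbol = re.sub(r"ampersand", "&", "the ampersand sign", count=1)

for value in (best, wrapped, doubled, symbol):
    print(value)
```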


3. Operating On Lines


3.1 Substituting Newline Characters


The ed program provides a facility for splitting a single line into two or more lines by substituting in a
newline character. If a line is unmanageably long because of editing or merely because of the way it was typed,
it can be divided. For example, the line
text xy text 
can be broken between the “x” and the “y” with the following substitute command: 


s/xy/x\ 
y/ 


This is actually a single command although it is typed on two lines. Bearing in mind that \ turns off special
meanings, it seems relatively intuitive that a \ at the end of a line would make the newline character there no
longer special.


A single line can be made into several lines with this same mechanism. The word “very” in the following 
example can be put on a separate line preceded with the nroff formatter underline command (.ul): 


text a very big text 
The command
s/<sp>very<sp>/\
.ul\
very\
/


converts the line into four shorter lines:


text a
.ul
very
big text


The word “very” is preceded by the line containing the “.ul” and spaces around “very” are eliminated at the
same time. 


When a new line is substituted in, dot is left pointing at the last line created. 


3.2 Joining Lines 
Lines may be joined together with the j command. Given the lines 


Now is 
<sp>the time 


and if dot is set to the first line, then the j command joins them together. No spaces are added, which is why 
a space is shown at the beginning of the second line. 


All by itself, a j command joins dot to dot+1. Any contiguous set of lines can be joined by specifying the 
starting and ending line numbers. For example: 


1,$jp 
joins all the lines into a big one and prints it. 


3.3 Rearranging Lines 


The & metacharacter stands for whatever was matched by the left side of an s command. Similarly, several
separate pieces of what was matched can be captured; the only difference is that the left side must specify just
which pieces the user is interested in. For instance, if there is a file of lines that consist of names in the form


Smith, A. B. 
Jones, C. 


etc., and the initials are to precede the name, as in:


A. B. Smith 
C. Jones 


it is possible to do this with a series of tedious and error-prone editing commands. The alternative is to “tag”
the pieces of the pattern (in this case, the last name and the initials) and then rearrange the pieces. On the left
side of a substitution, if part of the pattern is enclosed between \( and \), whatever matched that part is remembered
and available for use on the right side. On the right side, the symbol \1 refers to whatever matched the
first \(...\) pair, \2 to the second \(...\) pair, etc.
The command
1,$s/^\([^,]*\),<sp>*\(.*\)/\2<sp>\1/


although hard to read, does the job. The first “\(...\)” matches the last name, which is any string up
to the comma; this is referred to on the right side with “\1”. The second “\(...\)” is whatever
follows the comma and any spaces and is referred to as “\2”.
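
A pattern this dense is worth trying on a couple of names before it is run on a whole file. The sketch below assumes Python's re.sub as the test bench; note that Python writes the tagging parentheses without backslashes, which is a difference from ed. The \1 and \2 references on the right side work the same way.

```python
import re

names = ["Smith, A. B.", "Jones, C."]

# ^([^,]*)  tags the last name: everything up to the first comma.
# , *       skips the comma and any spaces after it.
# (.*)      tags the rest of the line: the initials.
swapped = [re.sub(r"^([^,]*), *(.*)", r"\2 \1", name) for name in names]

print(swapped)
```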


With any complicated editing sequence, it is foolhardy to run it and hope. Global commands (see paragraphs 
5.1 and 5.2) provide a way to print those lines affected by the substitute command. 


4. Line Addressing in Editor 
Line addressing in ed specifies the lines to be affected by editing commands. Previous constructions like 
1,$s/x/y/ 


were used to specify a change on all lines. Most users are familiar with using a single newline character (or
return) to print the next line and with


/string/
to find a line that contains “string”. Less familiar is the use of 
?string? 


to scan backwards for the previous occurrence of “string”. This is handy when the user realizes that the string 
to be operated on is back up the page (file) from the current line being edited. 


The slash and question mark are the only characters that can be used to delimit a context search. By contrast,
essentially any character can be used as a delimiter in a substitute command.


4.1 Address Arithmetic 
The next step is to combine the line numbers like ., $, /.../, and ?...? with + and -. Thus:
$-1
is a command to print the next to last line of the current file (i.e., one line before line $). For example:
$-5,$p
prints the last six lines. If there are not six lines, ed reports an error.
As another example:
.-3,.+3p


prints from three lines before the current line to three lines after, thus printing a bit of context. The + can be 
omitted: 


.-3,.3p
is identical in meaning. 


Another area in which to save typing effort in specifying lines is by using - and + as line numbers by themselves.
For instance, a


-


by itself is a command to move back up one line in the file. Several minus signs can be strung together to move
back up that many lines:


---


moves up three lines, as does “-3”. Thus:
-3,+3p
is also identical to the examples above.


Since “-” is shorter than “.-1”, constructions like


-,.s/bad/good/


are useful. This changes the first occurrence of “bad” to “good” on both the previous line and the current line. 
The + and — can be used in combination with searches using /.../, ?...?, and $. The search 
/string/--
finds the line containing “string” and positions dot two lines before it. 
4.2 Repeated Searches 
Suppose the search command is
/horrible string/


and, when the line is printed, it turns out not to be the horrible string that was wanted. The search must be
repeated, but it is not necessary to retype it. The construction


// 


is a shorthand for “the string that was previously searched for”, whatever it was. This can be repeated as many 
times as necessary. This also applies to the backwards search 


??
which searches for the same string but in the reverse direction. 


Not only can the search be repeated, but the // construction can be used on the left side of a substitute com- 
mand to mean “the most recent pattern”: 


/horrible string/ 
    (ed prints the line with “horrible string”)
s//good/p 
To go backwards and change a line, the following command is used: 
??s//good/ 
Of course, the & on the right-hand side of a substitute can still be used to stand for whatever got matched: 


//s//&<sp>&/p


finds the next occurrence of whatever was searched for last, replaces it by two copies of itself, and then prints 
the line just to verify that it worked. 


4.3 Default Line Numbers 
One of the most effective ways to speed editing is by knowing which lines are affected by a command with 
no address and where dot will be positioned when a command finishes. Editing without specifying unnecessary 
line numbers can save a lot of typing. As the most obvious example, the search command 
/string/ 


puts dot at the next line that contains “string”. No address is required with commands like: 


• s to make a substitution on the line
• p to print the line
• l to list the line
• d to delete the line
• a to append text after the line
• c to change the line
• i to insert text before the line.

If there was no “string”, dot stays on the line where it was. This is also true if it was sitting on the only
“string” when the command was issued. The same rules hold for searches that use ?...?; the only difference is
the direction of search.


The delete command (d) leaves dot at the line following the last deleted line. However, dot points to the new 
last line when the last line is deleted. 


Line-changing commands a, c, and i affect (by default) the current line if no line number is specified. They
behave identically in one respect—after appending, changing, or inserting, dot points at the last line entered. 
For example, the following can be done without specifying any line number for the substitute command or for 
the second append command: 


a
... text ...
... botch ...          (minor error)
.
s/botch/correct/       (fix botched line)
a
... more text ...


The following overwrites the major error and permits continuation of entering information: 


a
... text ...
... horrible botch ... (major error)
.
c                      (replace entire line)
... fixed up line ...
... more text ...


The read command (r) will read a file into the text being edited, either at the end if no address is given or 
after the specified line if an address is given. In either case, dot points at the last line read in. The 0r command
can be used to read in a file at the beginning of the text, and the 0a or 1i commands can be used to start adding
text at the beginning. 


The write command (w) writes out the entire file. If the command is preceded by one line number, that line 
is written. Preceding the command by two line numbers causes a range of lines to be written. The w command
does not change dot; therefore, the current line remains the same regardless of what lines are written. This is
true even if a command like


/^\.AB/,/^\.AE/w abstract
is made, which involves a context search. Since the w command is easy to use, the text being edited should be 
saved regularly just in case the system crashes or a file being edited is clobbered. 


The command with the least intuitive behavior is the s command. The dot remains at the last line that was
changed. If there were no changes, then dot is unchanged. To illustrate, if there are three lines in the buffer
and dot is sitting on the middle one


x1
x2
x3


the command
-,+s/x/y/p
prints the third line, which is the last one changed. But if the three lines had been 


x1
y2 
y3 


and the same command issued while dot pointed at the second line, then the result would be to change and print 
only the first line and that is where dot would be set. 
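
The rule for where dot ends up can be captured in a few lines of code. The following is a toy model only, written in Python under the assumption that a list of strings stands in for the buffer; the function name substitute_range and its arguments are hypothetical and are not part of ed.

```python
import re

def substitute_range(lines, first, last, pattern, repl, dot):
    """Toy model: replace the first match on each of lines first..last (1-based).

    Returns the new buffer and the new dot: the last line actually changed,
    or the old dot if no line changed.
    """
    new_lines = list(lines)
    for number in range(first, last + 1):
        changed, count = re.subn(pattern, repl, new_lines[number - 1], count=1)
        if count:
            new_lines[number - 1] = changed
            dot = number
    return new_lines, dot

# Dot is on line 2 in both cases; the range -,+ is therefore lines 1 through 3.
buffer_a, dot_a = substitute_range(["x1", "x2", "x3"], 1, 3, "x", "y", dot=2)
buffer_b, dot_b = substitute_range(["x1", "y2", "y3"], 1, 3, "x", "y", dot=2)

print(buffer_a, dot_a)
print(buffer_b, dot_b)
```

In the first case every line changes and dot finishes on line 3; in the second only line 1 changes, so dot finishes there.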


4.4 Semicolon 


Searches with /.../ and ?...? start at the current line and move forward or backward, respectively, until they 
either find the pattern or get back to the current line. Sometimes this is not what is wanted. Suppose, for example,
that the buffer contains lines like


...
ab
...
bc
...


Starting at line 1, one would expect that the command 


/a/,/b/p


prints all the lines from the “ab” to the “bc”, inclusive. This is not what happens. Both searches (for “a” and
for “b”) start from the same point, and thus they both find the line that contains “ab”. The result is to print
a single line. If there had been a line with a “b” in it before the “ab” line, then the print command would be
in error since the second line number would be less than the first; and it is illegal to try to print lines in reverse
order. This is because the comma separator for line numbers does not set dot while each address is processed.
Each search starts from the same place.


In ed, the semicolon (;) can be used just like the comma with the single difference being that use of a semico- 
lon forces dot to be set at that point while line numbers are being evaluated. In effect, the semicolon “moves” 
dot. Thus, in the example above, the command 


/a/;/b/p 


prints the range of lines from “ab” to “bc” because after the “a” is found dot is set to that line, and then “b”
is searched for starting beyond that line. This property is most often useful in a very simple situation. If the 
need is to find the second occurrence of “string”, then the commands 


/string/ 
// 


print the first occurrence as well as the second. The command 
/string/;// 


finds the first occurrence of “string” and sets dot there. Then it finds the second occurrence and prints only 
that line. 
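
The difference between the comma and the semicolon comes down to where the second search starts. The sketch below is a simplified model in Python, assuming a list of strings for the buffer; the helper search_forward is hypothetical and only imitates the wrap-around behavior described above.

```python
import re

def search_forward(lines, pattern, dot):
    """Return the 1-based number of the first matching line after dot,
    wrapping past the end of the buffer and trying dot itself last."""
    total = len(lines)
    for step in range(1, total + 1):
        number = (dot - 1 + step) % total + 1
        if re.search(pattern, lines[number - 1]):
            return number
    raise LookupError("pattern not found")

buffer = ["intro", "ab", "middle", "bc", "end"]
dot = 1

# /a/,/b/ : both searches start from the original dot.
comma = (search_forward(buffer, "a", dot), search_forward(buffer, "b", dot))

# /a/;/b/ : the second search starts from the line found by the first.
first = search_forward(buffer, "a", dot)
semicolon = (first, search_forward(buffer, "b", first))

print(comma)
print(semicolon)
```

With the comma both addresses land on the “ab” line, so only one line would be printed; with the semicolon the range runs from “ab” to “bc”.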


Searching for the second previous occurrence of “string”, as in 
?string?;?? 
is similar. Printing the third, fourth, etc. occurrence in either direction is left as an exercise. 


When searching for the first occurrence of a character string in a file where dot is positioned at an arbitrary 
place within the file, the command 


1;/string/ 
will miss an occurrence of “string” on line 1, because the search begins on the line after dot. It is possible to use the command

0;/string/ 
(one of the few places where 0 is a legal line number) to start the search at line 1. 
4.5 Interrupting the Editor 


If the user interrupts ed while performing a command (by depressing the BREAK key, the INTERRUPT 
key, or the user interrupt character [RUB OUT or DEL CHAR keys by default]), the file is put back together
again. The file state is restored as much as possible to what it was before the command began. Naturally, some 
changes are irrevocable. If the file is being read from or written into, substitutions are being made, or lines are 
being deleted, these will be stopped in some clean but unpredictable state in the middle of the command execu- 
tion (which is why it is not usually wise to stop them). Dot may or may not be changed. 


Printing is more clear cut. Dot is not changed until the printing is done. Thus, if a user interrupts ed while 
some printing is being done, dot is not sitting on the last printed line or even near it. Dot is returned to where 
it was when the p command was started. 


5. Global Commands 
5.1 Basic 


Global commands (g and v) are used to perform one or more editing commands on all lines of a file. The 
g command operates on those lines that contain a specified string. As the simplest example, the command 


g/THIS/p 
prints all lines that contain the string “THIS”. The string that goes between the slashes can be anything that
could be used in a line search or in a substitute command; exactly the same rules and limitations apply. As
another example:


g/^\./p




prints all lines that begin with a period. 


The v command (there is no mnemonic significance to the letter “v”) is identical to g, except that it operates 
on those lines that do not contain an occurrence of the string. So 


v/^\./p

prints the lines that do not begin with a period.

The command that follows g or v can be almost any command. For example:

g/^\./d

deletes all lines that begin with a period, and

g/^$/d

deletes all empty (blank) lines.


Probably the most useful command that can follow a global command is the substitute command since this 
can be used to make a change and print each affected line for verification. For example, to change the word
“This” to “THIS” everywhere in a file and verify that it really worked, the command is 


g/This/s//THIS/gp 


The use of // in the substitute command means “the previous pattern”, in this case, “This”. The p command 
is done on every line that matches the pattern, not just those on which a substitution took place. 
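
The global commands described so far can be exercised non-interactively, as in the following sketch. It assumes an ed that supports the -s option; the file doc.txt and its contents are invented for the example.

```shell
cd "$(mktemp -d)"
printf '%s\n' '.PP' 'This is text.' '' '.SH' 'This and This.' > doc.txt

# v prints the lines that do NOT begin with a period.
nodots=$(printf '%s\n' 'v/^\./p' q | ed -s doc.txt)
echo "$nodots"

# Delete the empty line, then change every "This" to "THIS",
# printing each affected line for verification, and write the file.
changed=$(ed -s doc.txt <<'EOF'
g/^$/d
g/This/s//THIS/gp
w
q
EOF
)
echo "$changed"
```

The second run prints the two changed lines and leaves doc.txt with four lines, the blank line having been removed.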


Global commands operate by making two passes over the file. On the first pass, all lines that match the 
pattern are marked. On the second pass, each marked line in turn is examined, dot is set to that line, and the 
command executed. This means that it is possible for the command that follows a g or v to use addresses, set 
dot, etc., quite freely. For example: 


g/^\.PP/+


prints the line that follows each “.PP” command (the signal for a new paragraph in some formatting packages). 
The + means “one line past dot”, and 


g/topic/?^\.SH?1

searches for each line that contains “topic”, scans backwards until it finds a line that begins with “.SH” (a section
heading) and prints the line that follows, thus showing the section headings under which “topic” is mentioned.
Finally:

g/^\.EQ/+,/^\.EN/-p


prints all the lines that lie between lines beginning with the “.EQ” and “.EN” formatting commands. 


The g and v commands can also be preceded by line numbers, in which case the lines searched are only those 
in the range specified. 
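
A short sketch of addresses used inside a global command, and of a line range in front of g, follows. The file paper.txt and its contents are hypothetical, and an ed with the -s option is assumed.

```shell
cd "$(mktemp -d)"
printf '%s\n' '.PP' 'First paragraph.' '.EQ' 'x = y' 'z = w' '.EN' \
    '.PP' 'Second paragraph.' > paper.txt

# Print the line that follows each .PP line.
after_pp=$(printf '%s\n' 'g/^\.PP/+' q | ed -s paper.txt)

# The same, but only lines 1 through 3 are searched for .PP.
first_only=$(printf '%s\n' '1,3g/^\.PP/+' q | ed -s paper.txt)

# Print the lines lying between each .EQ and the next .EN.
equation=$(printf '%s\n' 'g/^\.EQ/+,/^\.EN/-p' q | ed -s paper.txt)

echo "$after_pp"
echo "$first_only"
echo "$equation"
```

Note that the last command would fail on an .EQ line immediately followed by .EN, because the range would then run backwards.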




5.2 Multiline 


It is possible to do more than one command under the control of a global command although the syntax for
expressing the operation is not especially natural or easy. As an example, suppose the task is to change “x” to
“y” and “a” to “b” on all lines that contain “string”. Then:


g/string/s/x/y/\ 
s/a/b/ 


is sufficient. The backslash signals the g command that the set of commands continues on the next line. It termi- 
nates on the first line that does not end with \. A substitute command can not be used to insert a newline charac- 
ter within a g command. 


The command 


g/x/s//y/\ 
s/a/b/ 


does not work as expected. The remembered pattern is the last pattern that was actually executed, so sometimes 
it will be “x” (as expected) and sometimes it will be “a” (not expected). The desired pattern should be spelled 
out: 


g/x/s/x/y/\ 
s/a/b/ 


It is also possible to execute a, c, and i commands (append, change, and insert) under a global command. 
As with other multiline constructions, all that is needed is to add a \ at the end of each line except the last. 
Thus to add a .nf and .sp command before each “.EQ” line, the following is typed: 


g/^\.EQ/i\
.nf\
.sp


There is no need for a final line containing a period to terminate the i command unless there are further com- 
mands being done under the global. On the other hand, it does no harm to put it in. 
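
Both multiline forms can be tried with a sketch such as this one. The file names subs.txt and eq.txt are invented, an ed with the -s option is assumed, and the optional terminating period is included in the second script since it does no harm.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'string x a' 'plain x a' > subs.txt
printf '%s\n' 'text' '.EQ' 'a = b' '.EN' > eq.txt

# Two substitutions on every line that contains "string".
ed -s subs.txt <<'EOF'
g/string/s/x/y/\
s/a/b/
w
q
EOF

# Insert .nf and .sp before each .EQ line.
ed -s eq.txt <<'EOF'
g/^\.EQ/i\
.nf\
.sp\
.
w
q
EOF

cat subs.txt eq.txt
```

Only the first line of subs.txt is changed, because the second line does not contain “string”.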


It is good practice, after each global command, to check that the command did only what was desired. Sur- 
prises sometimes happen. When they do occur, the u command (undo) is useful to negate what was done by the 
last command. 

6. Cut and Paste 

One editing area in which nonprogrammers do not seem confident is the “cut and paste” operations. There 
are two areas in which the operations can be performed. Using the UNIX operating system command functions, 
the following can be done: 

• Changing the name of a file
• Making a copy of a file somewhere else
• Combining files
• Removing a file.


The text editor (ed) function performs the following operations. 




• Inserting one file in the middle of another
• Splitting a file into pieces
• Moving a few lines from one place to another in a file
• Copying lines.


Most of these operations are actually quite easy if the task is defined and precautions are taken when entering 
the commands. 


6.1 Command Functions 


Changing file names, making copies of files, combining files, and removing files are handled with the UNIX 
operating system commands. 


6.1.1 Changing Name of Files 


If there is a file named oldname and if it needs to be renamed to newname, the move command (mv) will
do the job. It moves the file from one name to another (the target file), for example

mv oldname newname

Note: If there is already a file with the new name, its contents will be overwritten with information
from the other (oldname) file. The one exception is that a file cannot be moved to itself; therefore, the
following command is illegal.

mv oldname oldname

6.1.2 Copying Files 
Sometimes a copy of a file is needed while retaining the original file. This might be because a file needs to
be worked on while a backup is kept in case something happens to it. In any case, the copy is made with
the copy command (cp). To make a copy of a file named good, the following command will place a copy in a file
named savegood:

cp good savegood

Two identical copies of the file good now exist. If savegood previously contained something, it is overwritten.
To get the file savegood back to its original filename, good, the following commands are used:

mv savegood good

if savegood is not needed anymore or

cp savegood good


to retain a copy of savegood. 


In summary, mv renames a file; cp makes a duplicate copy. Both commands overwrite the target file if one
already exists unless write permission is denied by the mode of the file. 
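
The backup-and-restore sequence can be sketched as follows. The file contents are made up, and the “botched edit” line merely simulates damage to the working file.

```shell
cd "$(mktemp -d)"
echo 'important text' > good

cp good savegood          # two identical copies now exist
echo 'botched edit' > good   # simulate something happening to the file

mv savegood good          # restore; savegood no longer exists
cat good
```

Using cp instead of mv in the last step would restore good and still keep savegood.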




6.1.3 Combining Files 
A familiar requirement is that of collecting two or more files into one big file, bigfile. This is needed, for 
example, when the author of a paper decides that several sections are to be combined. There are several ways 
to do this; the cleanest is a command called cat (not all commands have 2-letter names). The word cat is short
for “concatenate”, which is exactly what is desired. The command

cat file

prints the contents of the file on the terminal. The command

cat file1 file2

causes the contents of file1 and file2 to be printed on the terminal, in that order, but does not place them in
bigfile.

There is a way to tell the system to put the same information in a file instead of printing on the terminal.
The way to do it is to add to the command line the > character and the name of the file where the output is
to go. The command

cat file1 file2 > bigfile

is used and the job is done. As with cp and mv, when something is put into bigfile, anything already there is
destroyed. The ability to capture the output of a program can be used with any command that prints on a
terminal. Several files can be combined, not just two.

cat file1 file2 file3 ... > bigfile

collects many individual files.


Sometimes a file needs to be appended to the end of another file. For example: 


cat good good1 > temp
mv temp good

is the most direct way. The following command:

cat good good1 > good

does not work because the > empties good before the cat program begins. The easiest way is to use a variant
of >, called >>. In fact, >> is identical to > except that instead of clobbering the old file it adds something to
the end. Thus the command

cat good1 >>good

adds good1 to the end of good. If good does not exist, this makes a copy of good1 called good.

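
As a brief illustration of > and >>, with made-up file contents:

```shell
cd "$(mktemp -d)"
echo 'part one' > good
echo 'part two' > good1

cat good good1 > bigfile   # collect both files into a new file
cat good1 >> good          # append good1 to the end of good

cat bigfile
cat good
```

After these commands bigfile and good hold the same two lines, while good1 is unchanged.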
6.1.4 Removing Files 
If a file is not needed, it can be removed. The rm command 


rm savegood 


irrevocably deletes the file called savegood if the user has write permission.




6.2 Text Editor Functions 
Manipulating pieces of files, individual lines, or groups of lines is handled with the text editor.
6.2.1 File Names 


It is important to know the editor (ed) commands for reading and writing files. Equally useful is the edit 
command (e). Within ed, the command 


e newfile
says “edit a new file called newfile without leaving the text editor”. The e command discards whatever is being 
worked on and starts over on newfile. This is the same as if one had quit with the q command and reentered 
ed with a new file name except that if a pattern has been remembered, a command like // will still work. 
When entering ed with the command 


ed file 


ed remembers the name of the file, and any subsequent e, r, or w commands that do not contain a file name 
will refer to this remembered file. Thus: 


ed file1
... (editing)
w (writes back in file1)
e file2 (edit different file, without leaving ed)
... (editing on file2)
w (writes back on file2)


etc., does a series of edits on various files without leaving ed and without typing the name of any file more than 
once. By examining the sequence of commands in this example, it can be seen why many operating systems use 
e as a synonym for ed. 


The current file name can be found at any time with the f command by typing f without a file name. Also, 
the name of a remembered file can be changed with f. A useful sequence is 


ed precious
f junk
... (editing)


This obtains a copy of the file precious and guarantees that a subsequent w command without a filename will 
write to junk and will not overwrite the original file. 
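
The protective effect of the f command can be checked with a sketch like the following. The file contents are invented and an ed with the -s option is assumed; ed may echo the new file name when f is given.

```shell
cd "$(mktemp -d)"
echo 'old text' > precious

# Change the remembered file name, edit, and write without naming a file.
ed -s precious <<'EOF'
f junk
s/old/new/
w
q
EOF

cat precious   # the original is untouched
cat junk       # the edited copy
```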


6.2.2 Inserting One File Into Another 


When a file is to be inserted into another, the r command can be used. For example, if the file table is to 
be inserted just after the reference to “Table 1”, the following can be used: 


/Table 1/
Table 1 shows that... (response from ed)
.r table


The critical line is the last one. The .r command reads a file in after dot. An r command without any address 
adds lines to the end of the file, so it is equivalent to the $r command. 
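
A minimal sketch of the insertion, using the made-up files paper.txt and table and assuming an ed with the -s option:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Table 1 shows that...' 'Next paragraph.' > paper.txt
echo '1 2 3' > table

# Find the reference, read the table in after it, and write the result.
ed -s paper.txt <<'EOF'
/Table 1/
.r table
w
q
EOF

cat paper.txt
```

The contents of table now sit between the reference line and the paragraph that followed it.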




6.2.3 Writing Out Part of a File 


Another feature is writing to another file part of the document that is being edited. For example, it is possi- 
ble to split into a separate file the table from the previous example, so it can be formatted or tested separately. 
If in the file being edited, there is 


...
.TS
... lots of stuff
.TE
...


(which is the way a table is set up, as explained in Section III), to isolate the table in a separate file called table,
first the start of the table (the .TS line) is found, and then the interesting part is written on file table:

/^\.TS/
.TS (response from ed)
.,/^\.TE/w table

The same job can be accomplished with the single command

/^\.TS/;/^\.TE/w table


The point is that the w command can write out a group of lines instead of the whole file. A single line can 
be written by using one line number instead of two. For example, if a complicated line was just typed and it 
will be needed again, it should be saved and read in later rather than retyped: 


a
... lots of stuff
... stuff to repeat
.
.w temp
a
... more stuff
.
.r temp
a
... more stuff
.
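
Writing out part of a file can be sketched as follows. The file doc.txt and its contents are hypothetical, and an ed with the -s option is assumed.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Intro' '.TS' 'a b' '.TE' 'Outro' > doc.txt

# Write only the lines from .TS through .TE onto the file "table".
printf '%s\n' '/^\.TS/;/^\.TE/w table' q | ed -s doc.txt

cat table
```

The working file doc.txt is left as it was; only the three table lines are copied out.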


6.2.4 Moving Lines Around 


Moving a paragraph from its present position in a paper to the end can be done several ways. For example, 
it is assumed that each paragraph in the paper begins with the formatting command “.PP”. The brute force 
way (not necessarily bad) is to write the paragraph onto a temporary file, delete it from its current position, 
and then read in the temporary file at the end. If dot is at the “.PP” command that begins the paragraph, this 
is the sequence of commands: 


.,/^\.PP/-w temp
.,//-d
$r temp


This states that from where dot is now until one line before the next “.PP” write onto file temp. The same lines 
are deleted and the file temp is read in at the end of the working file. 




An easier way is to use the move command (m) that ed provides. This does the whole set of operations at 
one time without a temporary file. The m command is like many other ed commands in that it takes up to two 
line numbers in front to tell which lines are to be affected. It is also followed by a line number that tells where
the lines are to go. Thus: 


line1,line2 m line3


says “move all the lines from line1 through line2 to after line3”. Any of “line1”, etc., can be patterns between
slashes, dollar signs, or other ways to specify lines. If dot is at the first line of the paragraph, the command

.,/^\.PP/-m$

will also accomplish this task.


As another example of a frequent operation, the order of two adjacent lines can be reversed by moving the 
first one after the second. If dot is positioned at the first line, then 


m+ 

does it. It says to move the current line to after the line that follows dot. If dot is positioned on the second line:

m--

does the interchange. 


The m command is more concise and direct than writing, deleting, and rereading. The main difficulty with 
the m command is that if patterns are used to specify both the line being moved and the target line, they must 
be specified properly or the wrong lines may be moved. The result of a botched m command can be a costly mis- 
take. Doing the job a step at a time makes it easier to verify that each step accomplished what was wanted. 
It is also a good idea to issue a w command before doing anything complicated; then if an error is made, it is 
easy to back up. 
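
Both uses of the m command can be tried on scratch files, as in this sketch. The file names and contents are invented, an ed with the -s option is assumed, and the output of the positioning command 1 is discarded.

```shell
cd "$(mktemp -d)"
printf '%s\n' '.PP' 'first para' '.PP' 'second para' '.PP' 'third para' > paper.txt
printf '%s\n' 'B line' 'A line' > pair.txt

# Move the first paragraph (dot through the line before the next .PP) to the end.
ed -s paper.txt > /dev/null <<'EOF'
1
.,/^\.PP/-m$
w
q
EOF

# Reverse two adjacent lines by moving the first one after the second.
ed -s pair.txt > /dev/null <<'EOF'
1
m+
w
q
EOF

cat paper.txt pair.txt
```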


6.2.5 Copying Lines 


The ed program provides a transfer command (t) for making a copy of a group of one or more lines at any
point. This is often easier than writing and reading. The t command is identical to the m command except in- 
stead of moving lines it duplicates them at the place referenced. Thus: 


1,$t$ 


duplicates the entire contents of the file being edited. A more common use for t is creating a series of lines that
differ only slightly. For example: 


a
... long line of stuff
.
t. (make a copy)
s/x/y/ (change it a bit)
t. (make third copy)
s/y/z/ (change it a bit)
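
The same sequence can be run as a script. In this sketch the one-line file lines.txt is made up and an ed with the -s option is assumed.

```shell
cd "$(mktemp -d)"
echo 'value x = 1' > lines.txt

# Copy the current line twice, changing each copy a bit.
ed -s lines.txt <<'EOF'
t.
s/x/y/
t.
s/y/z/
w
q
EOF

cat lines.txt
```

The file ends up with three lines that differ only in the letter x, y, or z.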


6.2.6 Marks 


The ed program provides for marking a line with a particular name so that the line can be referenced later 
by its name regardless of its line number. This can be useful for moving lines and for keeping track of them 
as they move. The mark command is k. The mark name must be a single lowercase letter. The command 




kx 


marks the current line with the name “x”. If a line number precedes the k, that line is marked. The marked 
line can then be referred to with the address 


'x


Marks are most useful for moving things around. The first line of the block to be moved is found and marked 
with ka. Then the last line is found and marked with kb. Dot is then positioned at the place where the lines 
are to go and the following command is performed: 


'a,'bm.

Note: Only one line can have a particular mark name associated with it at any given time.
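
Moving a marked block can be sketched as follows. The six-line file block.txt is invented, an ed with the -s option is assumed, and the output of the positioning command 5 is discarded.

```shell
cd "$(mktemp -d)"
printf '%s\n' l1 l2 l3 l4 l5 l6 > block.txt

# Mark lines 2 and 3, position dot at line 5, and move the block there.
ed -s block.txt > /dev/null <<'EOF'
2ka
3kb
5
'a,'bm.
w
q
EOF

cat block.txt
```

Lines l2 and l3 now follow l5, whatever their line numbers were when they were marked.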


6.3 Temporary Escape 


Sometimes it is convenient to temporarily escape from the text editor to do some UNIX operating system 
command without leaving the text editor. The escape command (!) provides a way to do this. If the command 


!<any UNIX operating system command> 


is entered, the current editing state is suspended; and the command asked for is executed. When the command 
finishes, ed signals this by printing another ! and editing can be resumed.


Any UNIX operating system command may be performed including another ed (this is quite common). In 
this case, another ! can be done. 


7. Supporting Tools 

There are several related tools and techniques which are relatively easy to learn after ed has been learned 
because they are based on ed. This section gives some cursory examples of these tools, more to indicate their 
existence than to provide a complete tutorial. 
7.1 Global Printing From a Set of Files (grep) 

Sometimes all occurrences of some word or pattern in a set of files need to be found in order to edit them 
or perhaps to verify their presence or absence. It may be possible to edit each file separately and look for the
pattern of interest. If there are many files, this can be tedious; and if the files are really big, it may be impossible
because of limits in ed.


The grep program was written to get around these limitations. Search patterns described in this section 
are often called “regular expressions”, and “grep” stands for 


g/re/p 
This describes what grep does—it prints every line in a set of files that contains a particular pattern. Thus: 
grep 'string' file1 file2 file3 ...


finds “string” wherever it occurs in any of the files file1, file2, etc. The grep program also indicates the file
in which the line was found, so it can be edited later if needed. 


The pattern represented by “string” can be any pattern that can be used in the text editor since grep and 
ed use the same mechanism for pattern searching. It is wisest to enclose the pattern in single quotes ('...') if
it contains any nonalphabetic characters since many such characters also mean something special to the UNIX
operating system command interpreter (the “shell”). Without single quotes, the command interpreter will try
to interpret them before grep has the opportunity. 


There is also a way to find lines that do not contain a pattern: 
grep -v 'string' file1 file2 ...


finds all lines that do not contain “string”. The -v must occur in the position shown. Given grep and grep
-v, it is possible to select all lines that contain some combination of patterns. For example, to obtain all lines
that contain “x” but not “y”:

grep x file... | grep -v y

The pipe notation (|) causes the output of the first command to be used as input to the second command.
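
A small sketch of the pipeline, with two made-up files named f1 and f2:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'x alone' 'x and y' 'plain line' > f1
printf '%s\n' 'just x here' 'plain' > f2

# Lines that contain "x" but not "y"; grep prefixes each line with its file name.
result=$(grep x f1 f2 | grep -v y)
echo "$result"
```

Because the file name is part of each line passed down the pipe, a file whose name contains “y” would have all of its lines rejected by the second grep.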


7.2 Editing Scripts 


If a fairly complicated set of editing operations is to be performed on an entire set of files, the easiest 
thing to do is to make a script file, i.e., a file that contains the operations to be performed and then apply 
this script to each file in turn. For example, if every instance of “This” needs to be changed to “THIS” and 
every instance of “That” needs to be changed to “THAT” in a large number of files, a file script is made 
with the following contents: 


g/This/s//THIS/g
g/That/s//THAT/g
w
q


The following is done: 


ed file1 <script
ed file2 <script 


This causes ed to take its commands from the prepared script. The whole job has to be planned in advance. 


By using the UNIX operating system command interpreter [sh(1)], a set of files can be cycled automatically 
with varying degrees of ease. 
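
One way to cycle through the files is a shell for loop, sketched below. The two sample files and their contents are invented, and the -s option (assumed to be supported by the local ed) merely suppresses the character counts.

```shell
cd "$(mktemp -d)"
echo 'This and That' > file1
echo 'That is This' > file2

# The editing script from the text.
cat > script <<'EOF'
g/This/s//THIS/g
g/That/s//THAT/g
w
q
EOF

# Apply the script to each file in turn.
for f in file1 file2
do
    ed -s "$f" < script
done

cat file1 file2
```

The list after “in” can name as many files as needed, and the script itself is written only once.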


